Comparing machine learning algorithms to predict COVID‑19 mortality using a dataset including chest computed tomography severity score data

Since the beginning of the COVID-19 pandemic, new and non-invasive digital technologies such as artificial intelligence (AI) have been introduced for mortality prediction of COVID-19 patients. The prognostic performance of machine learning (ML)-based models for predicting clinical outcomes of COVID-19 patients has mainly been evaluated using demographics, risk factors, clinical manifestations, and laboratory results. Little is known about the prognostic role of imaging manifestations in combination with demographic, clinical, and laboratory predictors. The purpose of the present study is to develop an efficient ML prognostic model based on a more comprehensive dataset including the chest CT severity score (CT-SS). Fifty-five primary features in six main classes were retrospectively reviewed for 6854 suspected cases. The Chi-square test of independence was used to determine the most important features for predicting COVID-19 mortality. The most relevant predictors were used to train and test the ML algorithms. The predictive models were developed using eight ML algorithms: the J48 decision tree (J48), support vector machine (SVM), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), Naïve Bayes (NB), logistic regression (LR), random forest (RF), and eXtreme gradient boosting (XGBoost). Model performance was evaluated using accuracy, precision, sensitivity, specificity, and area under the ROC curve (AUC). After applying the exclusion criteria, the final sample comprised 815 RT-PCR-positive patients, of whom 54.85% were male; the mean age of the study population was 57.22 ± 16.76 years. The RF algorithm performed best, with an accuracy of 97.2%, sensitivity of 100%, precision of 94.8%, specificity of 94.5%, F1-score of 97.3%, and AUC of 99.9%. The other ML algorithms, with AUCs ranging from 81.2 to 93.9%, also predicted COVID-19 mortality well. The results showed that timely and accurate risk stratification of COVID-19 patients could be performed using ML-based predictive models fed by routine data. The proposed algorithm with the more comprehensive dataset including CT-SS could efficiently predict the mortality of COVID-19 patients. This could lead to promptly targeting high-risk patients on admission, the optimal use of hospital resources, and an increased probability of patient survival.


Methods
Dataset description. In this study, a COVID-19 hospital-based registry database was retrospectively reviewed from February 9, 2020, to December 20, 2020. This dataset included the data of the patients referred to Ayatollah Talleghani Hospital (COVID-19 referral centre), Abadan city, Iran.
A total of 6854 suspected cases had been referred to the hospital's ambulatory and emergency departments (EDs), of whom 1853 cases were introduced as positive RT-PCR COVID-19, 2472 as negative, and 2529 as unspecified.
In the COVID-19 hospital-based registry database, seventy-two primary features in six main classes including patient's demographics (eight features), clinical features (21 features), history of personal diseases/comorbidity (13 features), laboratory results (28 features), CT-SS (one feature), and an output variable (0: survived and 1: deceased) had been registered for COVID-19 patients. Primary features registered in the COVID-19 hospital-based registry database are listed in Table 1. Numerical parameters were measured quantitatively and nominal parameters were registered as Yes or No. In this database, the demographic information of patients and their history of personal diseases/comorbidity were registered from the medical records or by asking the patient and the patient's companions. For each patient, clinical features including cough, fever, shortness of breath, loss of smell, loss of taste, etc. were registered at the time of admission. Within the first 24 h of hospitalization, the patients' blood and urine samples were analysed and the patients underwent chest CT imaging. The laboratory results were automatically registered in their medical records.
Chest CT scores quantify the severity of pulmonary involvement in CT images. For each patient, five lung lobes were visually scored as 0 (no involvement), 1 (less than 5% involvement), 2 (5-25% involvement), 3 (25-50% involvement), 4 (50-75% involvement), and 5 (75-100% involvement). The total CT-SS is the sum of the individual lobar scores and ranges from 0 to 25. All CT images were separately reviewed by two radiologists. Any disagreements were resolved through consulting with an attending radiologist with 23 years of experience.
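The scoring rule above can be illustrated with a short sketch. In the study the lobes were scored visually by radiologists; the function names below are hypothetical, and because the stated ranges share their boundary values (for example 25% appears in both score 2 and score 3), the sketch assumes that each upper bound is inclusive.

```python
def lobe_score(involvement_pct: float) -> int:
    """Map the percentage of one lobe involved to the 0-5 score.

    Assumption: upper bounds are inclusive (25% -> 2, 50% -> 3, 75% -> 4),
    since the source ranges overlap at their boundaries.
    """
    if not 0 <= involvement_pct <= 100:
        raise ValueError("involvement must be between 0 and 100 percent")
    if involvement_pct == 0:
        return 0
    if involvement_pct < 5:
        return 1
    if involvement_pct <= 25:
        return 2
    if involvement_pct <= 50:
        return 3
    if involvement_pct <= 75:
        return 4
    return 5


def total_ct_ss(lobe_pcts) -> int:
    """Sum the five lobar scores; the total ranges from 0 to 25."""
    if len(lobe_pcts) != 5:
        raise ValueError("exactly five lung lobes are scored")
    return sum(lobe_score(p) for p in lobe_pcts)


# Hypothetical patient: per-lobe involvement in percent.
example_total = total_ct_ss([0, 3, 10, 40, 80])  # 0 + 1 + 2 + 3 + 5
```

A patient with every lobe more than 75% involved reaches the maximum total of 25.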
Data pre-processing. Data pre-processing is an essential step for handling irrelevant, redundant, and unreliable data, and it can resolve many inconsistencies 20. In this paper, data pre-processing was performed before the training of the ML models. First, records with more than 70% missing data were excluded from the dataset. The remaining missing values of continuous and discrete variables were imputed by mean and mode values, respectively. Noisy and abnormal values, errors, and meaningless data were addressed by an expert panel including one health information management expert (HKA), two infectious diseases specialists, and two haematologists.
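The two automated steps described above (dropping records with more than 70% missing data, then mean/mode imputation) might look like the following minimal sketch. It uses only the Python standard library; the field names, the toy records, and the use of `None` as the missing-value marker are assumptions for illustration, not the registry's actual schema.

```python
from statistics import mean, mode


def drop_sparse_records(records, max_missing_fraction=0.70):
    """Keep records whose share of missing fields is at most the threshold."""
    kept = []
    for rec in records:
        missing = sum(1 for value in rec.values() if value is None)
        if missing / len(rec) <= max_missing_fraction:
            kept.append(rec)
    return kept


def impute(records, continuous, discrete):
    """Fill missing continuous fields with the mean, discrete fields with the mode."""
    fill = {}
    for col in continuous:
        fill[col] = mean(r[col] for r in records if r[col] is not None)
    for col in discrete:
        fill[col] = mode(r[col] for r in records if r[col] is not None)
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in records
    ]


# Toy records (hypothetical values, not study data).
raw = [
    {"age": 60, "wbc": 7.0, "cough": "Yes", "fever": "Yes"},
    {"age": None, "wbc": 9.0, "cough": "Yes", "fever": "No"},
    {"age": 40, "wbc": None, "cough": None, "fever": "No"},
    {"age": None, "wbc": None, "cough": None, "fever": "No"},  # 75% missing
]
clean = impute(drop_sparse_records(raw), continuous=["age", "wbc"],
               discrete=["cough", "fever"])
```

Computing the fill values only from the retained records keeps the excluded, mostly empty records from influencing the means and modes.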
www.nature.com/scientificreports/
Only RT-PCR-positive COVID-19 cases were entered into the study. The exclusion criteria were a negative RT-PCR COVID-19 test, unknown disposition, discharge or death from the emergency department, more than 70% missing data, and age under 18 years. Figure 1 depicts the study inclusion and exclusion criteria. After applying the inclusion/exclusion criteria, the final sample size was 815 patients. This dataset contains 707 and 108 cases in the survival and death groups, respectively. Such an imbalanced input would bias the results toward the dominant class.
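An imbalance of this kind (707 versus 108 cases) is commonly addressed by synthetic minority oversampling (SMOTE). The sketch below illustrates the core interpolation idea only: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point on the segment between them. It is a simplified standard-library illustration with a hypothetical function name and toy points, not the imbalanced-learn implementation.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        # Convex combination of the two samples.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class (hypothetical two-feature points).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the existing minority data.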
The problem of the imbalanced dataset was dealt with using the synthetic minority over-sampling technique (SMOTE) (https://imbalanced-learn.org/stable/). SMOTE is the most frequently employed synthetic oversampling method; it creates synthetic samples of the minority class using randomly selected instances of the minority class and their k nearest neighbours 21. In this method, a random minority instance and its k nearest neighbours are selected. A second instance is then selected from the set of k nearest neighbours. The new synthetic sample is generated along the line joining the two samples as a convex combination. This procedure is repeated until the minority class is balanced with the majority class 22. Unlike random oversampling, which duplicates existing instances, SMOTE reduces the risk of overfitting and can yield relatively better results 23.
Feature selection. The feature selection process is widely used in data mining to determine the most important variables that are highly correlated with the output variable 24. One of the main advantages of this method is that it helps prevent overfitting of the ML algorithms 25. In this study, the most important variables for mortality prediction of COVID-19 were determined using XGBoost, random forest, and the Chi-squared test. The Chi-squared test evaluates the statistical differences in the parameters between the deceased and survived groups. The importance scores of the predictors calculated using XGBoost and random forest are depicted in Fig. 2. In all feature selection methods, a high score was achieved for strong predictors such as CT-SS, WBC, serum creatinine, etc. However, there were marked discrepancies between the importance scores calculated using XGBoost and random forest for some parameters. For example, the dialysis history of the patient had moderate importance in the XGBoost method, whereas the random forest algorithm assigned low importance to it.
There was no statistically significant difference in the dialysis history of the patient between the deceased and survived cases (P = 0.011, which is above the 0.01 significance level used here). On the other hand, phosphorus concentration in blood samples had low and moderate importance scores in the XGBoost and random forest methods, respectively. A strong predictor must first show a statistically significant difference between the deceased and survived cases in order to predict the mortality of COVID-19 patients correctly. Given the observed discrepancies, and to identify the predictors that differ significantly between the deceased and survived cases, the Chi-square test of independence was used to determine the most important variables in the mortality prediction of COVID-19 patients. The predictors selected by the Chi-square test had moderate to high importance scores in the XGBoost and random forest methods. It is worth mentioning that predictors such as CT-SS had high importance scores in both XGBoost and random forest and showed a statistically significant difference between the deceased and survived cases (P < 0.001). SPSS software (version 23) was used to calculate the Chi-square coefficients, and P < 0.01 was regarded as the significance level.
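As an illustration of the Chi-square test of independence for a single Yes/No predictor against the survived/deceased outcome, the following standard-library sketch computes the Pearson statistic and its P value for a 2x2 table. The counts are invented toy numbers, the function name is hypothetical, and no continuity correction is applied, so the output may differ from what SPSS reports for the same table.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy table (hypothetical counts): rows = predictor Yes/No,
# columns = deceased/survived.
chi2, p = chi_square_2x2([[30, 20], [10, 40]])
keep_feature = p < 0.01  # significance level stated in the text
```

With the threshold of 0.01, a predictor with P = 0.011 would be rejected by this rule, which matches how dialysis history is treated in the text.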
Model development. In this study, the predictive models were developed using eight ML algorithms: the J48 decision tree (J48), support vector machine (SVM), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), Naïve Bayes (NB), logistic regression (LR), random forest (RF), and eXtreme gradient boosting (XGBoost) 22. These mortality prediction models were implemented using the Waikato Environment for Knowledge Analysis (Weka) software (version 3.9.2, University of Waikato, New Zealand). The k-fold cross-validation method was used to evaluate the performance of the developed classifiers. This method has relatively low bias and variance, which makes it a preferred technique. The parameters of the selected ML algorithms for COVID-19 mortality prediction are described in Table 2.
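The k-fold procedure can be sketched as follows: the samples are shuffled, divided into k folds, and each fold serves once as the test set while the remaining folds form the training set. This is a plain-Python illustration, not Weka's implementation; the function name is hypothetical, and k = 10 is an assumed default because the text does not state the value of k.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Toy usage: 20 samples, 5 folds.
splits = list(k_fold_indices(20, k=5))
fold_sizes = [len(test) for _, test in splits]
```

The per-fold metrics are then averaged, so each record contributes to the evaluation exactly once as unseen data.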
The performances of the predictive models were evaluated using accuracy, precision, sensitivity, specificity, and area under the ROC curve (AUC) metrics. These performance metrics were compared for all ML algorithms to determine the best model for mortality prediction of COVID-19 patients.
[...] 6,14,16,17,29,30,36, headache 6,11,17,26,30,31,37, platelet count 6,14,28,29, and alanine aminotransferase (ALT) 6,14,29,31 were the irrelevant features in predicting COVID-19 mortality. Despite the clinical importance of these parameters for treatment success and mortality prediction, many of them could be eliminated from the ML analysis, so that mortality prediction could be performed with fewer factors and the same accuracy.
Evaluation of the developed models. In this study, COVID-19 mortality prediction models were developed using eight ML algorithms including J48, SVM, MLP, k-NN, NB, LR, RF, and XGBoost. These predictive models were built using the best feature subset determined in the previous step. The ML algorithms were trained using the same dataset. The performances of these models were evaluated using sensitivity, specificity, accuracy, precision, and AUC metrics. Results of the performance evaluation for the developed models are listed in Table 5.
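Of the metrics listed above, AUC is the least direct to compute. One way to read it is as the probability that a randomly chosen deceased case receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name, labels, and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: 1 = deceased, 0 = survived; scores are predicted risks.
toy_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy example three of the four deceased/survivor pairs are ordered correctly, giving an AUC of 0.75.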
The results showed that the RF algorithm predicted the mortality of COVID-19 patients better than the other ML algorithms. The sensitivity, specificity, accuracy, precision, F1-score, and AUC of the RF algorithm were 100.0%, 94.5%, 97.2%, 94.8%, 97.3%, and 99.9%, respectively. Figure 3 depicts the comparison of the area under the ROC curve for the developed ML algorithms.
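The threshold-based metrics reported here all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions using invented toy counts (not the study's confusion matrix, which is not given in the text), and then checks that the reported F1-score follows from the reported precision and sensitivity. The function names are hypothetical.

```python
def f1_from(precision, sensitivity):
    """F1-score: harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positives and negatives."""
    sensitivity = tp / (tp + fn)  # share of deceased cases detected
    specificity = tn / (tn + fp)  # share of survivors correctly cleared
    precision = tp / (tp + fp)    # share of predicted deaths that were real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1_from(precision, sensitivity),
    }


# Toy counts (hypothetical, for illustration only).
toy = classification_metrics(tp=90, fp=20, tn=80, fn=10)

# Consistency check using only the values reported for RF:
# precision 94.8% and sensitivity 100% imply F1 of about 97.3%.
rf_f1_pct = round(f1_from(0.948, 1.0) * 100, 1)
```

The check reproduces the reported F1-score of 97.3% from the reported precision and sensitivity.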

Discussion
With the COVID-19 outbreak, the global health system has faced a life-threatening infection with a wide range of symptoms and complications. For appropriate preparedness against the ongoing global pandemic, it is important to implement intelligent models for predicting which patients are at high risk of disease progression and poor outcomes. Timely and accurate identification of COVID-19 patients with poor outcomes can guide physicians in selecting appropriate treatment and allocating limited hospital resources. AI has created remarkable opportunities to determine the best models for diagnosis, risk analysis, screening, and prediction in response to the challenges facing the healthcare system. AI-based classification of chest images for the automatic diagnosis of COVID-19 was evaluated by Jyoti et al. and Goel et al. 39,40. The accuracy of MCA-inspired TQWT-based classification of chest X-ray images for the automatic diagnosis of COVID-19 was 98.82% and 94.64% for small and large datasets, respectively 39. An AI system achieved an AUC of 0.92 in screening for and detecting COVID-19. Its diagnostic sensitivity was equal to that of a senior thoracic radiologist, and for patients with positive RT-PCR results and normal CT scans, the developed AI model improved diagnosis in cases that the radiologist had reported as COVID-19 negative 41.
In the study by Asteris et al. 42, AI approaches were used for early prediction of COVID-19 outcomes. They predicted intensive care unit (ICU) hospitalization of COVID-19 patients using artificial neural networks (ANN). Laboratory parameters of adult patients were used to develop this predictive model. The accuracy, precision, sensitivity, and F1-score of the ANN for the validation cohort were 95.97%, 90.63%, 93.55%, and 92.06%, respectively. Their study showed that an AI-based predictive model could accurately predict ICU hospitalization using only 5 laboratory indices at the time of admission. These studies showed that AI could solve several issues affecting the diagnosis and prediction of COVID-19.
In the present study, we retrospectively analysed data from a hospital-based registry database to develop and evaluate ML models capable of predicting the risk of COVID-19 mortality. First, demographic information, risk factors, clinical manifestations, laboratory results, and imaging findings were examined to identify the most relevant predictors of COVID-19 mortality. The selected predictors were then used to train and test the ML algorithms. Eight ML algorithms (the J48 decision tree, k-NN, MLP, SVM, XGBoost, NB, RF, and LR) were used to develop the prediction models based on a dataset of laboratory-confirmed COVID-19 hospitalized patients. RF performed best among the ML approaches, with an accuracy of 97.2%, sensitivity of 100%, precision of 94.8%, specificity of 94.5%, F1-score of 97.3%, and AUC of 99.9%. The decision tree, XGBoost, k-NN, and MLP models, with AUC ≥ 93.9%, also predicted COVID-19 mortality well. The remaining ML algorithms ranked lowest in performance but were still acceptable (AUC ranging from 81.2 to 88.9%). The SVM model had the weakest performance among the ML models (AUC = 81.2%).
The prognostic performance of ML techniques for the mortality prediction of COVID-19 patients has been evaluated in several studies. In the study by Gao et al. 14, the mortality prediction of 2520 COVID-19 hospitalized patients was evaluated using LR, SVM, gradient boosted decision tree (GBDT), and neural network (NN) algorithms. For predicting COVID-19 patients' physiological deterioration and death up to 20 days in advance, the neural network-based prediction model, with an AUC of 97.60%, performed better than the LR, SVM, and GBDT algorithms.
In the study by Zakariaee et al. 6, the prognostic significance of the chest CT severity score in mortality prediction of COVID-19 patients was evaluated using k-NN, MLP, SVM, and J48 decision tree ML approaches. The retrospective analysis of the data of 815 COVID-19 hospitalized patients showed that the prognostic performance of the ML algorithms improved when CT-SS data were integrated with demographics, risk factors, clinical manifestations, and laboratory parameters. SVM was the weakest method in predicting mortality, and the k-NN model, with an accuracy of 94.1%, sensitivity of 100.0%, precision of 89.5%, specificity of 88.3%, and AUC of around 97.2%, performed better than the MLP, SVM, and J48 decision tree algorithms.
The prognostic performances of the decision tree (J48), MLP, k-NN, random forest (RF), and SVM data mining models were also evaluated by Moulaei et al. 16. The ML algorithms were developed using demographics, risk factors, and clinical manifestations of 850 COVID-19 hospitalized patients. Although all ML algorithms had good prognostic performance for mortality prediction of COVID-19 patients (AUCs > 96%), the RF model yielded the best prognostic results and SVM was the weakest method.
In another study by Moulaei et al. 17, mortality prediction for 1500 COVID-19 hospitalized patients was performed using the decision tree (J48), RF, k-NN, MLP, Naïve Bayes (NB), eXtreme gradient boosting (XGBoost), and logistic regression (LR) algorithms. The results demonstrated that the RF model, with an accuracy of 95.03%, sensitivity of 90.70%, precision of 94.23%, specificity of 95.10%, and AUC of 99.02%, had the best performance. The results of these studies were in close agreement with our findings. A summary of these studies is presented in Table 6, which lists the developed ML models, the datasets, and the prognostic performances of the ML models in predicting mortality in COVID-19 patients.
These ML algorithms were mostly developed using demographics, risk factors, clinical manifestations, and laboratory parameters; their datasets contained no clinical imaging data. Chest CT is one of the most common methods used to evaluate and diagnose patients with suspected SARS-CoV-2 infection 41. A systematic review and meta-analysis of chest CT manifestations in COVID-19 patients indicated that vascular enlargement, ground-glass opacities (GGOs), subpleural bands, and interlobular septal thickening were typical CT features of COVID-19 patients. Common (non-severe) patients were less likely than severe patients to have radiographic abnormalities involving more than two lobes. For severe patients, vascular enlargement, GGOs, interlobular septal thickening, air bronchogram, consolidation, subpleural bands, crazy-paving pattern, and traction bronchiectasis were the predominant CT features, and traction bronchiectasis, consolidation, interlobular septal thickening, crazy-paving pattern, reticulation, pleural effusion, and lymphadenopathy were related to the severity of the disease 43. The severity of pulmonary involvement on CT scans is significantly associated with mortality of COVID-19 patients (OR = 7.124, 95% CI 5.307-9.563) 19, and it could predict patient mortality with a sensitivity of 0.67 (95% CI 0.59-0.75) and a specificity of 0.79 (95% CI 0.74-0.84) 1.
In our study, the importance and efficiency of CT-SS in predicting COVID-19 mortality were evaluated using three feature selection methods: XGBoost, random forest, and the Chi-squared test. Our results showed that CT-SS is one of the most important and relevant parameters for predicting mortality risk in COVID-19 patients. A high importance score was observed for CT-SS in both XGBoost and random forest. In this study, similar to previous studies, deceased patients had higher CT-SSs, and there was a statistically significant difference between the deceased and survived cases (P < 0.001). These findings indicate that CT-SS is a strong predictor of mortality risk in COVID-19 patients. Thus, integrating this predictor with demographics, risk factors, clinical manifestations, and laboratory parameters would improve the prognostic performance of the ML algorithms for mortality prediction of COVID-19 patients.
These observations show that ML models are a valuable tool for making reliable clinical decisions and achieving evidence-based patient management to improve patient outcomes and the quality of medical care. The RF predictive model with the more comprehensive dataset including CT-SS could efficiently predict the mortality of COVID-19 patients. This could lead to the optimal use of hospital resources and an increased probability of patient survival.
Limitations. This study had some limitations. First, owing to the retrospective nature of the study, the predictive performance of the ML models in predicting the mortality of COVID-19 patients was not evaluated in a prospective cohort. Second, this is a single-centre study, and the included patients were primarily local residents of Abadan, Iran. External validation of the proposed model on larger, multicentre databases merits future investigation.

Conclusions
In this study, we compared the prognostic performances of the J48 decision tree, k-NN, MLP, SVM, XGBoost, NB, RF, and LR algorithms for mortality prediction of COVID-19 patients using a more comprehensive collection of features including CT-SS data, demographics, risk factors, clinical manifestations, and laboratory findings. The results showed that timely and accurate risk stratification of COVID-19 patients could be performed using ML-based predictive models fed by routine data. The RF predictive model with a comprehensive collection of predictors could enable prompt targeting of high-risk patients on admission and could therefore improve the probability of patient survival.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.